2,302 research outputs found

    Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

    Time course microarray data provide insight into dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, the comparison and selection of appropriate clustering methods are seldom discussed. We compared 3 probabilistic clustering methods and 3 distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA); functional clustering models for sparsely sampled data (FCM); and model-based clustering (MCLUST). Among distance-based methods, we considered weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW), and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long (≥10 observations). SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. WGCNA and MCLUST are the best of the 6 methods compared when performance and computation time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and an uncertainty measure for the clustering results.
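To make the DTW distance mentioned in this abstract concrete, here is a minimal sketch of the textbook dynamic time warping recurrence for two expression curves. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration; the abstract does not say which DTW variant or implementation the authors used.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Illustrative sketch only: uses |a_i - b_j| as the local cost and no
    warping-window constraint (both are assumptions, not from the abstract).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A matrix of such pairwise distances between gene curves could then be passed to any distance-based clustering routine.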

    Fast R Functions for Robust Correlations and Hierarchical Clustering

    Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation can handle calculations without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing data. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA. The hierarchical clustering algorithm implemented in R function hclust is an order n^3 (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm, which in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
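As an illustration of the robust measure named in this abstract, the sketch below computes the biweight midcorrelation of two complete vectors from its standard definition (median-centered values, weights based on 9 times the raw median absolute deviation). It is written in Python rather than R, the name `biweight_midcorrelation` is hypothetical, and it does not attempt to reproduce the missing-data handling, parallelization, or speed of the package functions described above.

```python
from math import sqrt
from statistics import median


def biweight_midcorrelation(x, y):
    """Biweight midcorrelation of two equal-length sequences (no missing values)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    def weighted_deviations(values):
        med = median(values)
        mad = median([abs(v - med) for v in values])  # raw MAD, no scaling constant
        if mad == 0:
            raise ValueError("MAD is zero; biweight weights are undefined")
        out = []
        for v in values:
            u = (v - med) / (9.0 * mad)
            # Observations with |u| >= 1 are treated as outliers and get weight 0.
            w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
            out.append((v - med) * w)
        return out

    a = weighted_deviations(x)
    b = weighted_deviations(y)
    norm = sqrt(sum(v * v for v in a)) * sqrt(sum(v * v for v in b))
    return sum(p * q for p, q in zip(a, b)) / norm
```

Because extreme observations receive zero weight, a single gross outlier has far less influence on this measure than on the Pearson correlation.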

    Network module detection: Affinity search technique with the multi-node topological overlap measure

    Background: Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings: We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion: Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
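The idea of interconnectedness "based on shared network neighbors" can be illustrated with the ordinary pairwise topological overlap for an unweighted, undirected network. The sketch below is only that pairwise special case. It is not the multi-node measure or the MAST procedure the abstract proposes, and the function name and list-of-lists adjacency format are assumptions.

```python
def topological_overlap(adj):
    """Pairwise topological overlap matrix for a symmetric 0/1 adjacency matrix.

    TOM[i][j] = (shared_neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    with the diagonal of adj assumed to be zero.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # convention: a node fully overlaps with itself
                continue
            # Number of nodes u (other than i and j) adjacent to both i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(degree[i], degree[j]) + 1 - adj[i][j])
    return tom
```

Two nodes score highly when they are linked and share most of their neighbors; a dissimilarity for clustering can be taken as 1 minus the overlap.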

    Multivariate variance-components analysis of longitudinal blood pressure measurements from the Framingham Heart Study

    Multivariate variance-components analysis provides several advantages over univariate analysis when studying correlated traits. It can test for pleiotropy or (in the longitudinal context) gene × age interaction. It can also have more power than univariate analyses to detect a quantitative trait locus influencing several traits. We apply multivariate variance components to longitudinal systolic blood pressure data from the Framingham Heart Study. We find evidence for a polygenic influence on blood pressure (heritabilities at different ages range from 27% to 38%). Tests based on a factor-analytic parameterization of the polygenic variance find significant (p < 2 × 10^-3) evidence that different genes affect blood pressure at different ages. Still, estimates for the proportion of polygenic variance due to shared genes ran as high as 85% for some trait pairs. Univariate and multivariate linkage analyses replicate previous linkage results on chromosome 17 (maximum LOD scores of 2.2 and 2.4, respectively). In this study, multivariate analysis provides no increase in power; this is likely due to the strong positive correlation in systolic blood pressure measured at different ages.

    Using genetic markers to orient the edges in quantitative trait networks: The NEO software

    Background: Systems genetic studies have been used to identify genetic loci that affect transcript abundances and clinical traits such as body weight. The pairwise correlations between gene expression traits and/or clinical traits can be used to define undirected trait networks. Several authors have argued that genetic markers (e.g., expression quantitative trait loci, eQTLs) can serve as causal anchors for orienting the edges of a trait network. The availability of hundreds of thousands of genetic markers poses new challenges: how to relate (anchor) traits to multiple genetic markers, how to score the genetic evidence in favor of an edge orientation, and how to weigh the information from multiple markers. Results: We develop and implement Network Edge Orienting (NEO) methods and software that address the challenges of inferring unconfounded and directed gene networks from microarray-derived gene expression data by integrating mRNA levels with genetic marker data and Structural Equation Model (SEM) comparisons. The NEO software implements several manual and automatic methods for incorporating genetic information to anchor traits. The networks are oriented by considering each edge separately, thus reducing error propagation. To summarize the genetic evidence in favor of a given edge orientation, we propose Local SEM-based Edge Orienting (LEO) scores that compare the fit of several competing causal graphs. SEM fitting indices allow the user to assess local and overall model fit. The NEO software allows the user to carry out a robustness analysis with regard to genetic marker selection. We demonstrate the utility of NEO by recovering known causal relationships in the sterol homeostasis pathway using liver gene expression data from an F2 mouse cross. Further, we use NEO to study the relationship between a disease gene and a biologically important gene co-expression module in liver tissue. Conclusion: The NEO software can be used to orient the edges of gene co-expression networks or quantitative trait networks if the edges can be anchored to genetic marker data. R software tutorials, data, and supplementary material can be downloaded from: http://www.genetics.ucla.edu/labs/horvath/aten/NEO.

    Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells

    Background: Recent work has revealed that a core group of transcription factors (TFs) regulates the key characteristics of embryonic stem (ES) cells: pluripotency and self-renewal. Current efforts focus on identifying genes that play important roles in maintaining pluripotency and self-renewal in ES cells and aim to understand the interactions among these genes. To that end, we investigated the use of unsigned and signed network analysis to identify pluripotency and differentiation related genes. Results: We show that signed networks provide a better systems level understanding of the regulatory mechanisms of ES cells than unsigned networks, using two independent murine ES cell expression data sets. Specifically, using signed weighted gene co-expression network analysis (WGCNA), we found a pluripotency module and a differentiation module, which are not identified in unsigned networks. We confirmed the importance of these modules by incorporating genome-wide TF binding data for key ES cell regulators. Interestingly, we find that the pluripotency module is enriched with genes related to DNA damage repair and mitochondrial function in addition to transcriptional regulation. Using a connectivity measure of module membership, we not only identify known regulators of ES cells but also show that Mrpl15, Msh6, Nrf1, Nup133, Ppif, Rbpj, Sh3gl2, and Zfp39, among other genes, have important roles in maintaining ES cell pluripotency and self-renewal. We also report highly significant relationships between module membership and epigenetic modifications (histone modifications and promoter CpG methylation status), which are known to play a role in controlling gene expression during ES cell self-renewal and differentiation. Conclusion: Our systems biologic re-analysis of gene expression, transcription factor binding, epigenetic and gene ontology data provides a novel integrative view of ES cell biology.
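The distinction between unsigned and signed networks comes down to how a correlation is turned into a connection strength. The sketch below shows the two power transformations commonly used in weighted co-expression network analysis. The function names are hypothetical, and the soft-threshold power `beta` is a user-chosen parameter whose value the abstract does not give.

```python
def unsigned_adjacency(r, beta):
    """Unsigned network: strong negative and positive correlations both give strong links."""
    return abs(r) ** beta


def signed_adjacency(r, beta):
    """Signed network: only positive correlations give strong links.

    The correlation is first mapped from [-1, 1] to [0, 1], so r = -1
    yields adjacency 0 and r = 1 yields adjacency 1.
    """
    return ((1.0 + r) / 2.0) ** beta
```

Under the unsigned transformation, two genes with a correlation of -0.9 are as tightly connected as two genes with +0.9. Under the signed transformation they are nearly disconnected, which keeps anti-correlated genes from being grouped into one module.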

    DNA methylation age is accelerated in alcohol dependence.

    Alcohol dependence (ALC) is a chronic, relapsing disorder that increases the burden of chronic disease and significantly contributes to numerous premature deaths each year. Previous research suggests that chronic, heavy alcohol consumption is associated with differential DNA methylation patterns. In addition, DNA methylation levels at certain CpG sites have been correlated with age. We used an epigenetic clock to investigate the potential role of excessive alcohol consumption in epigenetic aging. We explored this question in five independent cohorts, including DNA methylation data derived from datasets from blood (n = 129, n = 329), liver (n = 92, n = 49), and postmortem prefrontal cortex (n = 46). One blood dataset and one liver tissue dataset of individuals with ALC exhibited positive age acceleration (p < 0.0001 and p = 0.0069, respectively), whereas the other blood and liver tissue datasets both exhibited trends of positive age acceleration that were not significant (p = 0.83 and p = 0.57, respectively). Prefrontal cortex tissue exhibited a trend of negative age acceleration (p = 0.19). These results suggest that excessive alcohol consumption may be associated with epigenetic aging in a tissue-specific manner and warrant further investigation using multiple tissue samples from the same individuals.
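The abstract does not define "age acceleration". One common convention, assumed here, is the residual from a linear regression of the clock's DNA methylation age estimate on chronological age. The sketch below follows that assumption with a plain least-squares fit; the function name and input format are hypothetical and may not match the authors' exact procedure.

```python
def age_acceleration(chronological_age, dnam_age):
    """Residuals of DNAm age regressed on chronological age (simple OLS).

    Positive values mean a sample looks epigenetically older than expected
    for its chronological age; negative values mean younger.
    """
    n = len(chronological_age)
    if n != len(dnam_age) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(chronological_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, dnam_age)]
```

Under this convention, a group comparison such as cases versus controls would test whether the residuals differ between groups.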